Detecting High-Dimensional Outliers: the New Task, Algorithms and Performance
Authors
Abstract
Outlier detection is a fundamental step in knowledge discovery in databases. As high-dimensional databases become more common, existing outlier detection algorithms that work only in the full space cannot effectively screen out informative outliers, because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection task for high-dimensional data, namely finding the subspaces in which given points are outliers, and propose a novel outlier detection algorithm called High-D Outlier Detection (HighDOD). The intuitive idea is to measure the outlying degree of a point as the sum of the distances between that point and its k nearest neighbors. Two pruning strategies are proposed to speed up the subspace search, and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms the naive top-down, bottom-up and random search methods.
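The outlying degree described in the abstract can be illustrated with a minimal Python sketch. It assumes Euclidean distance, a NumPy array with one row per point, and a hypothetical `threshold` parameter for deciding when a subspace counts as outlying; none of these details are given in the abstract. The search shown is a plain exhaustive enumeration for illustration only, not the pruned dynamic search that HighDOD uses.

```python
from itertools import combinations

import numpy as np


def outlying_degree(data, idx, subspace, k):
    """Sum of distances from point `idx` to its k nearest neighbors,
    measured only on the dimensions listed in `subspace`.
    Euclidean distance is an assumption of this sketch."""
    proj = data[:, list(subspace)]                  # project onto the subspace
    dists = np.linalg.norm(proj - proj[idx], axis=1)
    dists = np.delete(dists, idx)                   # exclude the point itself
    return float(np.sort(dists)[:k].sum())


def outlying_subspaces(data, idx, k, threshold):
    """Exhaustively list subspaces in which point `idx` has an outlying
    degree of at least `threshold` (a hypothetical cut-off).
    This enumerates all 2^d - 1 subspaces, so it is only a baseline."""
    n_dims = data.shape[1]
    found = []
    for size in range(1, n_dims + 1):
        for subspace in combinations(range(n_dims), size):
            if outlying_degree(data, idx, subspace, k) >= threshold:
                found.append(subspace)
    return found


if __name__ == "__main__":
    # Toy data: the last point is far away on dimension 0 only.
    data = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [10.0, 1.0]])
    print(outlying_subspaces(data, idx=3, k=2, threshold=5.0))
```

In the toy data, the last point looks ordinary on dimension 1 alone but stands out in every subspace that includes dimension 0, which is the kind of subspace-only outlier the abstract refers to. The exponential cost of the exhaustive loop is what motivates pruning in practice.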
Similar resources
Detecting Suspicious Card Transactions in Unlabeled Bank Data Using Outlier Detection Techniques
With the advancement of technology, the use of ATM and credit cards has increased. Cyber fraud and theft are threats that arise from the use of these technologies. It is therefore essential to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of unauthorized access to another person's ...
An Efficient Genetic Algorithm for Task Scheduling on Heterogeneous Computing Systems Based on TRIZ
Efficient assignment and scheduling of tasks is one of the key elements in the effective utilization of heterogeneous multiprocessor systems. Because the task scheduling problem has been proven to be NP-hard, we used meta-heuristic methods to find a suboptimal schedule. In this paper we propose a new approach using TRIZ (specifically the 40 inventive principles). The basic idea of thi...
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern, powerful tools that can solve many important problems people face today. Support vector regression (SVR), a member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
Outlier detection for high dimensional data pdf
Is particularly useful for high dimensional data where outliers cannot be found. High dimensional data in Euclidean space pose special challenges to data. In about just the last few years, the task of unsupervised outlier detection has found. Outlier detection is an outstanding data mining task referred to ...